docs: update server streaming mode documentation #9519
Updated the streaming mode example script with split data handling, which has been tested with these unit tests: Avoided using Node.js
Btw have you been able to test it with latest version on
It seems like #9459 only added the
Example script still works with b4291 (ce8784b), but changed
On second thought, I don't think it's a good idea to add this to our documentation. Because we already follow the SSE standard (except for using the POST method), client code should be trivial to implement.
The documentation should be reserved for things that are specific to llama.cpp and cannot be found elsewhere on the internet.
In this case, the code you provided is the same as what you would write for the OpenAI implementation (because they also use SSE with the POST method), and there are many libraries on npm that can handle this (for example, this). So adding it here brings no additional information to the docs, while adding maintenance cost in the future.
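For readers following this thread, the "split data handling" discussed above comes down to buffering: a network chunk may end in the middle of an SSE event, so a client has to hold on to the incomplete tail until the blank line that ends the event arrives. Below is a minimal sketch of that idea, not the example script from this PR. The class name `SseBuffer` is hypothetical, the `{"content": ...}` payload shape is only an assumed illustration, and the sketch handles only `data:` fields with LF or CRLF line endings.

```typescript
// Hypothetical helper (not part of llama.cpp): incremental parser for SSE "data:" fields.
class SseBuffer {
  // Text received so far that does not yet form a complete event.
  private pending = "";

  // Feed one decoded text chunk; returns the data payload of every event completed by it.
  push(chunk: string): string[] {
    // Normalize CRLF on the whole buffer so a "\r\n" split across two chunks is still caught.
    this.pending = (this.pending + chunk).replace(/\r\n/g, "\n");
    const events = this.pending.split("\n\n");
    // The last element is an incomplete event (possibly empty); keep it for the next call.
    this.pending = events.pop() ?? "";

    const payloads: string[] = [];
    for (const event of events) {
      const dataLines = event
        .split("\n")
        .filter((line) => line.startsWith("data:"))
        // Drop the field name and the single optional space that follows the colon.
        .map((line) => line.slice(5).replace(/^ /, ""));
      if (dataLines.length > 0) {
        // Multiple data lines within one event are joined with newlines.
        payloads.push(dataLines.join("\n"));
      }
    }
    return payloads;
  }
}

// Usage: the first event is split across two chunks, the second one ends in a third chunk.
const parser = new SseBuffer();
const chunks = ['data: {"content":"Hel', 'lo"}\n\ndata: {"content":" world"}\n', "\n"];
const received: string[] = [];
for (const chunk of chunks) {
  received.push(...parser.push(chunk));
}
console.log(received);
```

In a real client, each chunk would come from decoding the streamed response body incrementally, and each returned payload would typically be passed to `JSON.parse`.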
Removed example code.
Provide more documentation for streaming mode.
Suggestions implemented.
Server documentation:
`n_predict` in the existing non-streamed example script (because on some computers 512 tokens can take a long time)